ABSTRACT


The objects contained in a sequence of images may be tracked from
frame to frame by defining a comparison function which evaluates the
difference between descriptions of object regions in adjacent frames.
One can then apply dynamic programming to discover the most temporally
consistent object region. Removing all descriptions of this region
from all frames allows dynamic programming to be reapplied iteratively.

If one assumes that the desired objects in a scene may be extracted
by thresholding at some set of gray levels, one may view the extraction
of the above-threshold connected components as a process for producing
candidate object regions. One may then classify the candidates into
object regions and accidents (noise regions produced by thresholding).
The Superslice algorithm uses two general heuristics and one piece of
user-supplied knowledge. The first heuristic demands that the interior
of a region contrast significantly with its surround. The second
heuristic requires that the border points of the region correspond to
positions of maximal edge detector response. Measures associated with
these heuristics may be computed as the connected components are
extracted. In addition, the user may control the false alarm rate by
specifying a size range for object regions. The two measures and the
size range are then used to build a classification.

The regions which survive the classification process have an inherent
forest-like structure. Since an object may be extracted by thresholding
over a range of adjacent gray levels, the candidate regions
corresponding to the object can be ordered by containment. The
containment relation defines the forest-like structure. A sequence of
nested regions which do not differ much in size and shape may be
considered to be a set of "exemplars" of the object. Not all regions
which survive the classification step correspond to objects, however.
A certain number of accidents tend to be present as well. All regions
which do survive will be called "candidate object regions".
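
The containment ordering can be made concrete with a small sketch. It assumes that each candidate region is stored as a set of pixel coordinates, which is a hypothetical representation not specified in the text, and it links every region to its smallest strict superset. The name `containment_forest` is likewise illustrative.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Pixel = Tuple[int, int]
Region = FrozenSet[Pixel]  # assumed representation: a set of pixel coordinates


def containment_forest(regions: List[Region]) -> Dict[int, Optional[int]]:
    """Map each region index to its parent (smallest strict superset), or None for a root."""
    parent: Dict[int, Optional[int]] = {}
    for i, r in enumerate(regions):
        best: Optional[int] = None
        for j, s in enumerate(regions):
            # r < s is the strict-subset test on frozensets.
            if i != j and r < s:
                if best is None or len(s) < len(regions[best]):
                    best = j
        parent[i] = best
    return parent


# Three nested exemplars of one object plus an unrelated region.
small = frozenset({(1, 1)})
medium = frozenset({(1, 1), (1, 2)})
large = frozenset({(1, 1), (1, 2), (2, 1)})
other = frozenset({(9, 9)})
forest = containment_forest([small, medium, large, other])
```

Each chain from a leaf to its root is one nested sequence of exemplars; regions with parent `None` are the roots of the forest.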

Other statistics besides the contrast, edge coincidence and size
measures are computed during the analysis. These may include texture,
shape, and positional information. The frame to frame tracking process
will use these features to build consistent temporal sequences of
candidate object regions.


3. Evaluating candidate object regions

There are two issues involved in finding a best sequence of exemplars
for an object by choosing one exemplar per frame. First, within each
frame we wish to select the "best" among all exemplars for each object.
Second, on a frame to frame basis, we wish to avoid sudden changes in
size, shape, position or other descriptive features associated with
the tracked object. Realizing the former goal involves defining a
figure of merit so that all exemplars of the same object may be
compared among themselves. The Superslice procedure provides such a
figure of merit based on the object/accident discriminant. Other
things being equal, one would wish to choose the exemplar which
represents the underlying object most closely. In the absence of
specific models for particular object types, the general requirements
of good contrast and good border/edge match are appropriate. In the
example to be presented, the figure of merit was a weighted sum of
three features: contrast, border/edge match, and the number of edge
points internal to the region.
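
A weighted-sum figure of merit of this kind can be sketched as follows. The sketch expresses it as a cost (lower is better) and assumes that all three features have been scaled to the range 0 to 1; the scaling, the unit weights, the `threshold` field and the names `Exemplar`, `static_cost` and `best_exemplar` are all assumptions made for illustration, not values taken from the text.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Exemplar:
    threshold: int         # gray level that produced this region (hypothetical field)
    contrast: float        # interior/surround contrast in [0, 1], larger is better
    edge_match: float      # border/edge coincidence in [0, 1], larger is better
    interior_edges: float  # fraction of interior edge points in [0, 1], smaller is better


def static_cost(e: Exemplar, weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three feature shortfalls; a perfect exemplar costs 0."""
    w_contrast, w_edge, w_interior = weights
    return (w_contrast * (1.0 - e.contrast)
            + w_edge * (1.0 - e.edge_match)
            + w_interior * e.interior_edges)


def best_exemplar(exemplars: Sequence[Exemplar]) -> Exemplar:
    """Pick the exemplar with the lowest static cost from one nested sequence."""
    return min(exemplars, key=static_cost)


nested = [
    Exemplar(threshold=30, contrast=0.5, edge_match=0.25, interior_edges=0.0),
    Exemplar(threshold=25, contrast=0.75, edge_match=0.75, interior_edges=0.0),
    Exemplar(threshold=20, contrast=0.5, edge_match=0.5, interior_edges=0.25),
]
best = best_exemplar(nested)
```

In this made-up nested sequence the middle exemplar wins because it has both the best contrast and the best border/edge match.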

Consider a sequence of exemplars for a single object corresponding to
a range of thresholds. Because the sequence is nested we may speak of
the "smallest" exemplar, etc. Assume that a "correct" exemplar is
known (say, from additional ground truth). A "too small" exemplar will
tend to have lower contrast (since the exterior neighbors of the
border cells will in fact lie within the object region) and lower
border/edge match (since the border points lie behind or just at the
shoulder of the edge ramp, while the maximum edge response lies along
the middle of the edge ramp). A "too big" exemplar will exhibit a
similar response pattern. However, for the exemplars closest to the
correct one, the features will not behave consistently. The exemplars
just larger than the correct one often have more contrast due to their
higher ratio of interior points to border points.

Given a figure of merit, one could choose the best exemplar from each
sequence in the forest. The correspondence of exemplars from frame to
frame might then be made by some simple matching procedure or by some
modification of the procedure to be described below. Tracking based on
best exemplars runs the risk that the best exemplar of an object in
one frame bears little resemblance to its best exemplar in the next.
This may be due to noise which afflicts certain frames more than
others. The premise here is that an object does not change character
significantly from frame to frame and that the changes which do occur
should be smooth rather than abrupt.

Associated with each candidate object region is a feature vector
(along with the figure of merit). We can measure the disparity or
inconsistency between two candidate object regions by computing the
normalized Euclidean distance between their feature vectors. By using
several features to define a disparity measure, we reduce the
sensitivity of the method to gross changes with respect to a single
feature. As with the figure of merit, we may weight the features
entering the disparity computation according to their frame to frame
consistency. This weighting can be guided by the semantics of motion
(e.g., in plastic deformation, area and perimeter will remain roughly
constant, but second order moments will change).

For the example we investigated, an equal weighting of features was
chosen.
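
The disparity measure can be sketched as a weighted, scaled Euclidean distance. The text does not say how the distance is normalized; the sketch assumes each feature difference is divided by a per-feature scale (for example, an estimate of that feature's spread) before squaring. The function name `disparity` and the `scale` and `weights` parameters are illustrative.

```python
import math
from typing import Optional, Sequence


def disparity(a: Sequence[float], b: Sequence[float],
              scale: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Euclidean distance between feature vectors after per-feature scaling."""
    if not (len(a) == len(b) == len(scale)):
        raise ValueError("feature vectors and scale must have equal length")
    if weights is None:
        weights = [1.0] * len(a)  # equal weighting, as in the example investigated
    total = 0.0
    for x, y, s, w in zip(a, b, scale, weights):
        total += w * ((x - y) / s) ** 2
    return math.sqrt(total)


# Two hypothetical (area, perimeter) feature vectors with unit scales.
d = disparity([0.0, 0.0], [3.0, 4.0], scale=[1.0, 1.0])
```

Setting a weight to zero removes a feature from the comparison, which is one way to encode that a feature such as a second order moment is expected to change under the assumed motion.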

The dynamic programming model

In the previous section, we discussed two evaluation functions: a
static evaluation function $S(c)$, defined for each candidate object
region $c$ and based on the figure of merit, and a dynamic evaluation
function $D(c,c')$, which for any pair of candidates defines their
disparity. Let $c_1, \ldots, c_N = c$ be a sequence of candidate
object regions ending with the region $c$. We define the total cost of
the region $c$ as

\[ T(c) = \sum_{i=1}^{N} S(c_i) + \sum_{i=1}^{N-1} D(c_i, c_{i+1}). \]

$S(c)$ is defined so that a perfect exemplar has a score of 0.
Similarly, $D(c,c) = 0$.

Let $\{c_{ij},\ j = 1, \ldots, N_i\}$ be the set of candidate regions
in the $i$th frame, $i = 1, \ldots, M$. We define the dynamic
programming problem as: find $\{c_{i\pi(i)},\ i = 1, \ldots, M\}$ such
that $T(c_{M\pi(M)})$ is minimum over all selection functions $\pi$.
The solution is achieved by the following:

Basis step:

\[ T(c_{1j}) = S(c_{1j}), \quad j = 1, \ldots, N_1 \]

Iterative step:

\[ T(c_{i+1,j}) = S(c_{i+1,j}) + \min_{k = 1, \ldots, N_i} \{ T(c_{ik}) + D(c_{ik}, c_{i+1,j}) \}, \quad j = 1, \ldots, N_{i+1} \]
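
The basis and iterative steps translate into a short forward pass with back-pointers. The following is a minimal sketch, not the authors' implementation: the function name `track`, the representation of a candidate as a `(static cost, feature)` pair, and the example `S` and `D` are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")


def track(frames: Sequence[Sequence[C]],
          S: Callable[[C], float],
          D: Callable[[C, C], float]) -> Tuple[float, List[int]]:
    """Return the minimum total cost and the chosen candidate index in each frame."""
    if not frames or any(len(f) == 0 for f in frames):
        raise ValueError("every frame needs at least one candidate")
    # Basis step: T(c_1j) = S(c_1j).
    T = [S(c) for c in frames[0]]
    back: List[List[int]] = []
    # Iterative step: extend the best sequence ending in the previous frame.
    for i in range(1, len(frames)):
        prev = frames[i - 1]
        new_T: List[float] = []
        ptr: List[int] = []
        for c in frames[i]:
            k_best = min(range(len(prev)), key=lambda k: T[k] + D(prev[k], c))
            new_T.append(S(c) + T[k_best] + D(prev[k_best], c))
            ptr.append(k_best)
        T = new_T
        back.append(ptr)
    # Pick the cheapest end point and follow the back-pointers to frame 1.
    j = min(range(len(T)), key=T.__getitem__)
    cost = T[j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return cost, path


# Hypothetical candidates: (static cost, one scalar feature such as area).
frames = [
    [(0.0, 10.0), (0.5, 50.0)],
    [(0.25, 11.0), (0.0, 80.0)],
    [(0.125, 12.0), (0.0, 49.0)],
]
cost, path = track(frames, S=lambda c: c[0], D=lambda a, b: abs(a[1] - b[1]))
```

In this toy input the second candidate of the last two frames has the better static score, yet the selected path stays with the first candidate throughout because its feature changes smoothly. The work per frame transition is proportional to the product of the candidate counts in the two frames.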

The above procedure finds the minimum cost sequence of candidate
object regions. Candidate regions which are accidental are unlikely to
persist from frame to frame; thus their D terms are likely to be
large, thereby increasing the total cost of any sequence which
includes them. Note that there will be many sequences which are only
slightly more costly than the minimum. These suboptimal sequences will
be based on other exemplars for the same object.

The optimal sequence is thus optimal for the particular formulations
of S and D. Giving more weight to S and less to D will tend to select
best exemplars, while the reverse weighting will tend to favor frame
to frame consistency. Once again, a semantic model can provide
guidance.

In general, the image sequence may contain more than one object. The
scheme described above identifies the "best" object region sequence.
In order to extract region sequences corresponding to other objects in
the image sequence, we must delete all candidate object regions
accounted for by the optimal sequence. The inherent data structure
specifies which regions are exemplars for each object. By deleting all
candidate object regions in each frame which are similar to the
selected region of the optimal sequence (i.e., contain it or are
contained in it), we can set the stage for another application of
dynamic programming. This process is repeated until only very poor
(high cost) sequences are obtained. Presumably, at this point all
objects have been accounted for.
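
The deletion step can be sketched directly from the containment test. As before, regions are assumed to be stored as sets of pixel coordinates, and the name `delete_accounted` is hypothetical.

```python
from typing import FrozenSet, List, Tuple

Pixel = Tuple[int, int]
Region = FrozenSet[Pixel]  # assumed representation: a set of pixel coordinates


def delete_accounted(frames: List[List[Region]],
                     selected: List[Region]) -> List[List[Region]]:
    """Drop from each frame every region that contains, or is contained in, the selected one."""
    remaining: List[List[Region]] = []
    for regions, chosen in zip(frames, selected):
        # r <= chosen: r is contained in the selected region (or is that region);
        # chosen <= r: r contains the selected region.
        remaining.append([r for r in regions if not (r <= chosen or chosen <= r)])
    return remaining


tank_small = frozenset({(1, 1)})
tank_large = frozenset({(1, 1), (1, 2)})
noise = frozenset({(7, 7)})
frames = [[tank_small, tank_large, noise], [tank_large, noise]]
left = delete_accounted(frames, selected=[tank_small, tank_large])
```

The selected region itself is removed along with its other exemplars, so the next pass of dynamic programming sees only regions belonging to other objects or to noise.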

Occasionally, a deletion step may leave a particular frame empty of
candidate object regions. This may occur for two reasons: all objects
were accounted for by the last dynamic programming step, or the
candidate region proposer failed to elicit an exemplar for an actual
object. In the former case, the process will have terminated. The
latter case can be handled by associating a fixed "empty frame" cost
which is the price paid for skipping a frame. Of course, one cannot
know which case applies. The conservative approach is always to assume
the second case and apply the empty frame cost. The termination
criterion will then be based on a threshold for the total cost, i.e.,
terminate when only costly sequences remain.
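
One way to apply an empty frame cost is sketched below. It is an assumption of this sketch, not something stated in the text, that the disparity across a skipped frame is measured between the candidates on either side of the gap. The name `min_cost_with_gaps` and its parameters are illustrative.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

C = TypeVar("C")


def min_cost_with_gaps(frames: Sequence[Sequence[C]],
                       S: Callable[[C], float],
                       D: Callable[[C, C], float],
                       empty_cost: float) -> float:
    """Minimum total cost when an empty frame is skipped at a fixed penalty."""
    prev: Optional[Sequence[C]] = None  # candidates of the latest non-empty frame
    T: List[float] = []                 # T[k]: best cost of a sequence ending at prev[k]
    skipped = 0.0                       # penalties paid before the first non-empty frame
    for frame in frames:
        if not frame:
            if prev is None:
                skipped += empty_cost
            else:
                T = [t + empty_cost for t in T]
            continue
        if prev is None:
            T = [skipped + S(c) for c in frame]
        else:
            T = [S(c) + min(T[k] + D(prev[k], c) for k in range(len(prev)))
                 for c in frame]
        prev = frame
    return skipped if prev is None else min(T)


# Hypothetical scalar candidates; the middle frame lost all of its regions.
gap_cost = min_cost_with_gaps([[1.0], [], [2.0]],
                              S=lambda c: 0.0,
                              D=lambda a, b: abs(a - b),
                              empty_cost=5.0)
```

Comparing the returned cost against a threshold gives the termination test: once every remaining sequence exceeds it, the iteration stops.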

The problem of an object leaving the field of view can be handled in a
different manner by flagging candidate object regions which lie on the
border of the image. A partial sequence whose last element is flagged
but which overall has low cost can be accepted as depicting an object
which has moved off the image.


The dynamic programming algorithm described above has been implemented
and tested on a sequence of ten windows of FLIR data containing a tank
(Figure 1). These windows have already been smoothed by a 3x3 median
filter to provide better response to thresholding. The Superslice
algorithm extracted a modest number of candidate object regions.
Figure 2 displays these regions (although for nested sequences only
the best static exemplar is displayed). Table 1 shows the feature
values associated with each candidate in the first two frames. The
solution to the dynamic programming problem was computed and the
exemplars which correspond to the solution are shown in Figure 3.
There are of course many suboptimal solutions which are quite similar
to this one. Their cost is not significantly greater than the minimal
cost. When the indicated regions were deleted along with all other
similar candidates, the only remaining regions corresponded to noise
and any minimal cost path attempting to span several frames was
substantially more costly than the optimal path or any of its similar
suboptimal paths. It seems reasonable to establish thresholds for
static and dynamic cost in order to prune the search space. More
sequential data bases are needed to determine the extent to which
these comments are valid.


Figure 1. A sequence of 10 median filtered FLIR windows of a tank.

Figure 2. Output of the Superslice region proposing algorithm.

Figure 3. Optimal sequenced regions using dynamic programming.

[Table residue; only the column heading "2nd Central Moments" is recoverable.]

6. Conclusion


Objects may be tracked in a sequence of scenes in which frame to frame
change is slight. The dynamic programming method relies on the
heuristic that even though some motion or change may have taken place
in the scene, descriptions of the same object tend to cluster more
closely than do descriptions of different objects. Thus, a measure
based on similarity and consistency can provide a reliable match
function even in a dynamic environment. After an object has been
tracked consistently through a sequence of frames, one may measure its
motion, deformation, etc.